library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(ggplot2)
sales <- read.csv("~/Downloads/sales.csv")
sales_ts <- ts(sales$Sales, start = c(2020,1) , end = c(2024,8), frequency = 12)
plot(sales_ts, main='Monthly sales for the US pharma company', xlab ='Year', ylab='Sales')
The time series plot shows upward trend from 2020 to mid 2023, indicating growth in sales. Around mid 2023, the sales reach a peak followed by some decline and fluctuations. There is fluctuation each year, suggesting seasonal pattern. This could be due to factors such as seasonal demand or production cycles.
Acf(sales_ts)
The ACF plot shows autocorrelation at multiple lags, gradually decreasing as the lags increase. Meaning sales at one time period are correlated with sales in future periods. There is trend and seasonality.
summary(sales_ts)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1322 2892 7246 8753 14346 22397
boxplot(sales_ts, main ="Boxplot for sales")
Based on the summary of the sales, with values from 1322 to 22397. The mean (8753) is higher than the median (7246), meaning data is right skewed, indicating occasional high sales months that higher up the average.
The data contains a few high sales values, shown by the upper range of the box plot. The most of sales fall within the range of 5000-15000. There are no outliers.
decompose_sales <- decompose(sales_ts)
plot(decompose_sales)
stl_sales <- stl(sales_ts, s.window = "periodic" )
plot(stl_sales)
Yes, the time series is seasonal. There is recurring pattern in sales each year which confirms the seasonality.
The decomposition is additive, the seasonal changes stay about the same over time, meaning their size does not change in proportion to the trend level.
seasonal_indices <- stl_sales$time.series[,"seasonal"]
print(seasonal_indices)
## Jan Feb Mar Apr May Jun
## 2020 -1316.0439 -1310.8511 1131.7504 -290.8039 1070.6495 1647.5391
## 2021 -1316.0439 -1310.8511 1131.7504 -290.8039 1070.6495 1647.5391
## 2022 -1316.0439 -1310.8511 1131.7504 -290.8039 1070.6495 1647.5391
## 2023 -1316.0439 -1310.8511 1131.7504 -290.8039 1070.6495 1647.5391
## 2024 -1316.0439 -1310.8511 1131.7504 -290.8039 1070.6495 1647.5391
## Jul Aug Sep Oct Nov Dec
## 2020 746.8760 -1631.1533 1038.7615 -1294.8648 -1597.4789 1805.6200
## 2021 746.8760 -1631.1533 1038.7615 -1294.8648 -1597.4789 1805.6200
## 2022 746.8760 -1631.1533 1038.7615 -1294.8648 -1597.4789 1805.6200
## 2023 746.8760 -1631.1533 1038.7615 -1294.8648 -1597.4789 1805.6200
## 2024 746.8760 -1631.1533
month_high <- which.max(seasonal_indices)
month_low <- which.min(seasonal_indices)
print(paste("Month with highest seasonal value:", month_high))
## [1] "Month with highest seasonal value: 12"
print(paste("Month with lowest seasonal value:", month_low))
## [1] "Month with lowest seasonal value: 8"
High Seasonal Values Month - December
Low Seasonal Values Month - August
Sales may high in winter months like December, due to a rise in seasonal illnesses like colds and flu, which could lead high sales for healthcare products. Where in summer people are less likely to fall ill.
seasadj(stl_sales)
## Jan Feb Mar Apr May
## 2020 2817.850902 2633.022101 290.139585 1733.058938 725.949537
## 2021 3503.819402 3827.316601 1943.008085 3225.872438 2237.517037
## 2022 6291.673902 6520.124601 4746.499585 8052.506938 5881.887037
## 2023 11911.318902 12818.901101 17391.972585 14855.561438 15358.864037
## 2024 15588.890402 15356.680101 17826.918585 15318.652438 19968.564037
## Jun Jul Aug Sep Oct
## 2020 -2.266111 983.619987 3205.245305 1321.387037 3177.900311
## 2021 2630.496389 3531.159487 5398.467305 3009.211037 6703.340811
## 2022 5892.453889 9217.103487 9675.637305 10752.377037 10540.082811
## 2023 20749.238389 17173.825487 16927.997805 18201.495037 14218.971811
## 2024 15855.575889 15172.516487 11783.086305
## Nov Dec
## 2020 3603.105397 956.336543
## 2021 6197.209897 4473.198543
## 2022 9806.672397 11829.037543
## 2023 16389.139397 19944.602043
## 2024
plot(sales_ts)
lines(seasadj(stl_sales), col ="Red")
The red line represents the seasonally adjusted time series, where the seasonal component has been removed. The difference between the actual and seasonally adjusted series shows the seasonal fluctuations in the sales data. Seasonality have big fluctuations in the value of time series.
naive_forecast <- naive(sales_ts, h = 12)
plot(naive_forecast)
Naive Model: The simplest model, forecasting that future values will be the same as the last observed value. The forecasted line is constant and flat for the next 12 months.
plot(residuals(naive_forecast) , main = "Residuals from Naïve Method",
ylab = "Residuals",
xlab = "Time")
The residual plot shows a pattern in the residuals, indicating that the Naive method does not capture the seasonality in the data.
hist(residuals(naive_forecast), main = "Residuals from Naïve Method",
xlab = "Residuals")
The histogram of residuals indicates that the residuals are spread around zero, also it is asymmetrical distribution, with some large positive and negative residuals. The residuals are not random, further indicating that the Naive method does not fully capture the underlying structure of the data.
cbind(Fitted = fitted(naive_forecast),
Residuals=residuals(naive_forecast)) %>%
as.data.frame() %>%
ggplot(aes(x=Fitted, y=Residuals)) + geom_point()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
The lack of randomness in the residuals with fitted values, confirms that the Naive method is insufficient.
cbind(Data=sales_ts,
Residuals=residuals(naive_forecast)) %>%
as.data.frame() %>%
ggplot(aes(x=Data, y=Residuals)) + geom_point()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
It shows the pattern, hence Naive method is not capturing trend or seasonality.
Acf(residuals(naive_forecast))
Several lags show positive autocorrelation at the early lags. This indicates that the residuals are not random, and there is a patternremaining in the data that the Naive method has failed to capture.
accuracy(naive_forecast)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 157.275 3029.52 2104.495 0.6371928 20.12969 0.4670533 -0.433312
forecast_nextYear <- forecast(naive_forecast, h=12)
plot(forecast_nextYear)
print(forecast_nextYear)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 2024 10151.93 6269.4469 14034.42 4214.1830 16089.68
## Oct 2024 10151.93 4661.2686 15642.60 1754.6864 18549.18
## Nov 2024 10151.93 3427.2699 16876.60 -132.5517 20436.42
## Dec 2024 10151.93 2386.9609 17916.91 -1723.5671 22027.43
## Jan 2025 10151.93 1470.4302 18833.44 -3125.2797 23429.15
## Feb 2025 10151.93 641.8232 19662.04 -4392.5248 24696.39
## Mar 2025 10151.93 -120.1596 20424.03 -5557.8769 25861.74
## Apr 2025 10151.93 -829.3959 21133.26 -6642.5603 26946.43
## May 2025 10151.93 -1495.5252 21799.39 -7661.3171 27965.18
## Jun 2025 10151.93 -2125.5659 22429.43 -8624.8813 28928.75
## Jul 2025 10151.93 -2724.8165 23028.68 -9541.3560 29845.22
## Aug 2025 10151.93 -3297.3932 23601.26 -10417.0365 30720.90
It is forecasting that future values will be the same as the last observed value. The forecasted line is constant and flat for the next 12 months. Each month forecast for the next year is set at 10151.93,
The RMSE of 3029.52 is high, showing that the Naive method is not highly accurate for this data set, especially given the evident trend and seasonality.
The time series value will remain at 10151.93 throughout the next year. Therefore, in one year (August 2025), the forecasted value remains 10151.93.
The Naive method is too simple for this time series because it fails to capture both the upward trend and the seasonal variations.
simple_ma3 <- ma(sales_ts, order = 3)
plot(sales_ts, col ="black", lwd =2)
lines(simple_ma3, col='red', lwd =2)
Simple Moving average of order three is more close to actual time series
simple_ma6 <- ma(sales_ts, order = 6)
plot(sales_ts, col ="black", lwd =2)
lines(simple_ma6, col='blue', lwd =2)
Simple Moving average of order six is relatively smooth than actual time series
simple_ma9 <- ma(sales_ts, order = 9)
plot(sales_ts, col ="black", lwd =2)
lines(simple_ma9, col='green', lwd =2)
Simple Moving average of order nine is smooth than 3 & 6 to actual time series
plot(sales_ts, col ="black", lwd =2)
lines(simple_ma3, col='red', lwd =2)
lines(simple_ma6, col='blue', lwd=2)
lines(simple_ma9, col= 'green',lwd=2)
legend("topleft", legend = c("Original", "SMA(3)", "SMA(6)", "SMA(9)"),
col = c("black", "red", "blue", "green"), lwd =2)
fore_sa <- forecast(simple_ma6, h=12)
## Warning in ets(object, lambda = lambda, biasadj = biasadj,
## allow.multiplicative.trend = allow.multiplicative.trend, : Missing values
## encountered. Using longest contiguous portion of time series
plot(fore_sa, main ='SMA forecast')
summary(fore_sa)
##
## Forecast method: ETS(M,A,N)
##
## Model Information:
## ETS(M,A,N)
##
## Call:
## ets(y = object, lambda = lambda, biasadj = biasadj, allow.multiplicative.trend = allow.multiplicative.trend)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 0.9999
##
## Initial states:
## l = 1500.8852
## b = 38.6626
##
## sigma: 0.0212
##
## AIC AICc BIC
## 694.8967 696.2603 704.4568
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -4.52012 224.8055 155.8064 0.2345688 1.672027 0.03319346 0.2347074
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jun 2024 16570.53 16119.310 17021.75 15880.449 17260.61
## Jul 2024 16383.21 15376.421 17390.00 14843.459 17922.96
## Aug 2024 16195.89 14515.083 17876.69 13625.318 18766.46
## Sep 2024 16008.57 13553.712 18463.42 12254.190 19762.95
## Oct 2024 15821.25 12504.575 19137.92 10748.835 20893.66
## Nov 2024 15633.93 11376.364 19891.49 9122.547 22145.31
## Dec 2024 15446.61 10175.504 20717.71 7385.151 23508.06
## Jan 2025 15259.29 8906.849 21611.72 5544.073 24974.50
## Feb 2025 15071.97 7574.095 22569.83 3604.963 26538.97
## Mar 2025 14884.64 6180.025 23589.26 1572.079 28197.21
## Apr 2025 14697.32 4726.680 24667.97 -551.460 29946.11
## May 2025 14510.00 3215.466 25804.54 -2763.501 31783.51
The Simple Moving average of order six would likely be the best for forecasting this time series given its captures short and long term trends.
The moving average line becomes smoother.
ses_forecast <- ses(sales_ts, h=12)
plot(ses_forecast)
summary(ses_forecast)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = sales_ts, h = 12)
##
## Smoothing parameters:
## alpha = 0.4319
##
## Initial states:
## l = 1472.029
##
## sigma: 2603.101
##
## AIC AICc BIC
## 1110.202 1110.664 1116.279
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 521.0807 2556.194 1608.255 5.757111 15.49006 0.3569223 -0.1451411
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 2024 14075.16 10739.153 17411.17 8973.177 19177.14
## Oct 2024 14075.16 10441.301 17709.02 8517.652 19632.67
## Nov 2024 14075.16 10166.078 17984.24 8096.735 20053.59
## Dec 2024 14075.16 9908.998 18241.32 7703.564 20446.76
## Jan 2025 14075.16 9666.884 18483.44 7333.284 20817.04
## Feb 2025 14075.16 9437.393 18712.93 6982.307 21168.01
## Mar 2025 14075.16 9218.734 18931.59 6647.897 21502.42
## Apr 2025 14075.16 9009.505 19140.82 6327.909 21822.41
## May 2025 14075.16 8808.582 19341.74 6020.623 22129.70
## Jun 2025 14075.16 8615.047 19535.27 5724.637 22425.68
## Jul 2025 14075.16 8428.141 19722.18 5438.789 22711.53
## Aug 2025 14075.16 8247.226 19903.09 5162.104 22988.22
The SES forecast is a horizontal line for the future, aa it captures the level of the series without accounting for trend or seasonality
alpha = 0.4319 The model moderately gives weigtage to recent data.
Level (l = 1472.029) The start of the forecast period.
Sigma (2603.101) represents the standard deviation of the residuals
plot(ses_forecast$residuals , main = "Residuals from SES Method",
ylab = "Residuals",
xlab = "Year")
It is showing peaks and trough which means data is skewed. #### Do a Histogram plot of residuals. What does the plot indicate?
hist(ses_forecast$residuals, main = "Residuals from SES Method",
xlab = "Residuals")
The histogram of residuals indicates that the residuals are spread around zero, also it is asymmetrical distribution, The histogram appears to be skewed on one side. #### Do a plot of fitted values vs. residuals. What does the plot indicate?
cbind(Fitted = fitted(ses_forecast),
Residuals=residuals(ses_forecast)) %>%
as.data.frame() %>%
ggplot(aes(x=Fitted, y=Residuals)) + geom_point()
cbind(Data = sales_ts,
Residuals=residuals(ses_forecast)) %>%
as.data.frame() %>%
ggplot(aes(x=Data, y=Residuals)) + geom_point()
Acf(ses_forecast$residuals)
accuracy(ses_forecast)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 521.0807 2556.194 1608.255 5.757111 15.49006 0.3569223 -0.1451411
fore_ses <- forecast(ses_forecast, h=12)
plot(fore_ses)
print(fore_ses)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 2024 14075.16 10739.153 17411.17 8973.177 19177.14
## Oct 2024 14075.16 10441.301 17709.02 8517.652 19632.67
## Nov 2024 14075.16 10166.078 17984.24 8096.735 20053.59
## Dec 2024 14075.16 9908.998 18241.32 7703.564 20446.76
## Jan 2025 14075.16 9666.884 18483.44 7333.284 20817.04
## Feb 2025 14075.16 9437.393 18712.93 6982.307 21168.01
## Mar 2025 14075.16 9218.734 18931.59 6647.897 21502.42
## Apr 2025 14075.16 9009.505 19140.82 6327.909 21822.41
## May 2025 14075.16 8808.582 19341.74 6020.623 22129.70
## Jun 2025 14075.16 8615.047 19535.27 5724.637 22425.68
## Jul 2025 14075.16 8428.141 19722.18 5438.789 22711.53
## Aug 2025 14075.16 8247.226 19903.09 5162.104 22988.22
The SES forecast is constant over time, reflecting overall average based on recent data, with no upward or downward trend.
Accuracy metrics indicate that SES provides moderate accuracy
The SES forecast predicts a constant value of 14075.16 for each month over the next year
SES provides a simple baseline forecast, it unsuitable for time series data with trend and seasonal components.
hw_forecast <- hw(sales_ts, h=12)
plot(hw_forecast)
Holt-Winters model is forecasting both trend and seasonal effects for the next 12 months.
summary(hw_forecast)
##
## Forecast method: Holt-Winters' additive method
##
## Model Information:
## Holt-Winters' additive method
##
## Call:
## hw(y = sales_ts, h = 12)
##
## Smoothing parameters:
## alpha = 0.3176
## beta = 0.0862
## gamma = 1e-04
##
## Initial states:
## l = 818.4249
## b = 343.2258
## s = 1774.076 -1565.1 -1259.728 1059.152 -814.7662 753.5157
## 2154.785 120.2898 -57.6465 1068.374 -1562.394 -1670.559
##
## sigma: 2428.542
##
## AIC AICc BIC
## 1113.622 1129.728 1148.053
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -210.3775 2052.493 1661.449 -3.401198 33.49673 0.3687275
## ACF1
## Training set -0.02495954
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 2024 14915.453 11803.1507 18027.75 10155.5976 19675.31
## Oct 2024 11924.642 8568.2064 15281.08 6791.4168 17057.87
## Nov 2024 10947.345 7260.7678 14633.92 5309.2115 16585.48
## Dec 2024 13614.601 9515.1067 17714.09 7344.9656 19884.24
## Jan 2025 9498.231 4909.7127 14086.75 2480.6981 16515.76
## Feb 2025 8934.255 3788.3217 14080.19 1064.2291 16804.28
## Mar 2025 10892.902 5128.5202 16657.28 2077.0406 19708.76
## Apr 2025 9094.772 2657.3234 15532.22 -750.4562 18940.00
## May 2025 8601.422 1441.6443 15761.20 -2348.5131 19551.36
## Jun 2025 9963.072 2036.0955 17890.05 -2160.1923 22086.34
## Jul 2025 7890.275 -845.1816 16625.73 -5469.4528 21250.00
## Aug 2025 5649.690 -3932.5846 15231.97 -9005.1345 20304.52
Alpha = 0.3176 the model places moderate weight on recent data for the level component
beta = 0.0862 It is relatively low hence trend is slower
gamma = 0.0001 low gamma value suggests minimal smoothening for seasonality
level l = 818.4249 start point of the smoothed series at the beginning of the forecast period.
Trend b = 343.2258 This initial trend value seasonality (s) - values represent the starting seasonal indices for each month
sigma = 2428.542 the standard deviation of residuals
plot(hw_forecast$residuals , main = "Residuals from hw_forecast Method",
ylab = "Residuals",
xlab = "Time")
hist(hw_forecast$residuals, main = "Residuals from hw_forecast Method",
xlab = "Residuals")
cbind(Fitted = fitted(hw_forecast),
Residuals=residuals(hw_forecast)) %>%
as.data.frame() %>%
ggplot(aes(x=Fitted, y=Residuals)) + geom_point()
cbind(Data = sales_ts,
Residuals=residuals(hw_forecast)) %>%
as.data.frame() %>%
ggplot(aes(x=Data, y=Residuals)) + geom_point()
Acf(hw_forecast$residuals)
accuracy(hw_forecast)
## ME RMSE MAE MPE MAPE MASE
## Training set -210.3775 2052.493 1661.449 -3.401198 33.49673 0.3687275
## ACF1
## Training set -0.02495954
hw_fore <- forecast(hw_forecast, h=12)
plot(hw_fore)
print(hw_fore)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 2024 14915.453 11803.1507 18027.75 10155.5976 19675.31
## Oct 2024 11924.642 8568.2064 15281.08 6791.4168 17057.87
## Nov 2024 10947.345 7260.7678 14633.92 5309.2115 16585.48
## Dec 2024 13614.601 9515.1067 17714.09 7344.9656 19884.24
## Jan 2025 9498.231 4909.7127 14086.75 2480.6981 16515.76
## Feb 2025 8934.255 3788.3217 14080.19 1064.2291 16804.28
## Mar 2025 10892.902 5128.5202 16657.28 2077.0406 19708.76
## Apr 2025 9094.772 2657.3234 15532.22 -750.4562 18940.00
## May 2025 8601.422 1441.6443 15761.20 -2348.5131 19551.36
## Jun 2025 9963.072 2036.0955 17890.05 -2160.1923 22086.34
## Jul 2025 7890.275 -845.1816 16625.73 -5469.4528 21250.00
## Aug 2025 5649.690 -3932.5846 15231.97 -9005.1345 20304.52
The Holt-Winters additive method is a time series forecasting model that accounts for level, trend, and seasonality.
the Holt-Winters method shows moderate accuracy, capturing both trend and seasonality.
The model provides monthly forecasts for the next year, with values that vary due to seasonality August 2025 - 5649.69
The Holt-Winters additive method provides a well-rounded forecast, capturing the trend and seasonal patterns observed in the data
accuracy_naive <- accuracy(naive_forecast)
accuracy_ses <- accuracy(ses_forecast)
accuracy_hw <- accuracy(hw_forecast)
accuracy_table <- data.frame(
Model = c("Naive", "SES", "Holt-Winters"),
RMSE = c(accuracy_naive["Training set", "RMSE"], accuracy_ses["Training set", "RMSE"], accuracy_hw["Training set", "RMSE"]))
accuracy_table$Model_Rank <- rank(accuracy_table$RMSE)
print(accuracy_table)
## Model RMSE Model_Rank
## 1 Naive 3029.520 3
## 2 SES 2556.194 2
## 3 Holt-Winters 2052.493 1
The Holt-Winters will be most suitable for this and naive and ses would not be sutable as it does nto capture trend and seasonality.
Based on the analysis, the time series is expected to stay relatively flat, with seasonal fluctuations, over the next year and likely the next two years.
If the current trend continues, sales are likely to continue increasing over the next two years. The upward trend suggests sustained growth, and the seasonal patterns are expected to remain consistent unless significant market changes occur.
accuracy_table$Model_Rank <- rank(accuracy_table$RMSE)
print(accuracy_table)
## Model RMSE Model_Rank
## 1 Naive 3029.520 3
## 2 SES 2556.194 2
## 3 Holt-Winters 2052.493 1